Improving k-Nearest Neighbour Classification with Distance Functions Based on Receiver Operating Characteristics

Authors

  • Md. Rafiul Hassan
  • M. Maruf Hossain
  • James Bailey
  • Kotagiri Ramamohanarao
Abstract

The k-nearest neighbour (k-NN) technique is a simple, interpretable and intuitively appealing method for addressing classification problems. However, choosing an appropriate distance function for k-NN can be challenging, and an inferior choice can make the classifier highly vulnerable to noise in the data. In this paper, we propose a new method for determining a good distance function for k-NN. Our method is based on the area under the Receiver Operating Characteristics (ROC) curve, which is a well-known measure of the quality of binary classifiers. It computes weights for the distance function based on ROC properties within an appropriate neighbourhood of the instances whose distance is being computed. We experimentally compare our scheme with a number of other well-known k-NN distance metrics, as well as with a range of different classifiers. Experiments show that our method can substantially boost the classification performance of the k-NN algorithm. Furthermore, in a number of cases our technique even delivers better accuracy than state-of-the-art non-k-NN classifiers, such as support vector machines.
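
To make the idea of ROC-derived distance weights concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the weighting formula, and the paper computes weights within a neighbourhood of the instances being compared, whereas this sketch assumes a single global weight per feature. The function names (`auc`, `feature_weights`, `weighted_distance`, `knn_predict`), the weight formula `2 * |AUC - 0.5|`, and the 0/1 label encoding are all illustrative assumptions.

```python
import math
from collections import Counter


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Assumes binary labels encoded as 0 (negative) and 1 (positive).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined with one class; treat as chance level
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def feature_weights(X, y):
    """Hypothetical weighting: how far each feature's AUC is from chance."""
    n_features = len(X[0])
    return [
        2.0 * abs(auc([row[j] for row in X], y) - 0.5)
        for j in range(n_features)
    ]


def weighted_distance(a, b, w):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(wj * (aj - bj) ** 2 for wj, aj, bj in zip(w, a, b)))


def knn_predict(X, y, query, k=3, weights=None):
    """Majority vote among the k training points nearest to `query`."""
    w = weights if weights is not None else [1.0] * len(query)
    order = sorted(range(len(X)), key=lambda i: weighted_distance(X[i], query, w))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, -3.0], [0.1, 0.0], [0.2, 3.0], [1.0, -6.0], [1.1, 0.0], [1.2, 6.0]]
y = [0, 0, 0, 1, 1, 1]
w = feature_weights(X, y)
query = [0.2, 6.0]
print(w, knn_predict(X, y, query, k=1), knn_predict(X, y, query, k=1, weights=w))
```

On this toy data the noisy second feature receives zero weight, so the weighted 1-NN prediction follows the informative first feature, while the unweighted prediction is pulled towards the wrong class by the noise. A neighbourhood-based variant, closer in spirit to the abstract, would recompute the weights from the training points near each pair of instances.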

Similar resources

Class-Based Attribute Weighting for Time Series Classification

In this paper, we present two novel class-based weighting methods for the Euclidean nearest neighbor algorithm and compare them with global weighting methods considering empirical results on a widely accepted time series classification benchmark dataset. Our methods provide higher accuracy than every global weighting in nearly half of the cases and they have better overall performance. We concl...

Nearest Neighbour Distance Matrix Classification

Distance-based classification is a popular approach that classifies instances using a point-to-point distance based on the nearest neighbour or k-nearest neighbour (k-NN). The distance can be any of the various measures available (e.g. Euclidean distance, Manhattan distance, Mahalanobis distance or other specific distance measures). In this paper, we propose ...

Hesitant Fuzzy k-Nearest Neighbour (HFK-NN) Classifier for Document Classification and Numerical Result Analysis

This paper presents a new Hesitant Fuzzy k-nearest neighbour (HFK-NN) approach to document classification, together with an analysis of numerical results. The proposed HFK-NN classification approach is based on the hesitant fuzzy distance. In this paper we use hesitant fuzzy distance calculations to classify documents. The following steps are used for classif...

An Empirical Comparison of Weighting Functions for Multi-label Distance-Weighted K-nearest Neighbour Method

Multi-label classification is an extension of classical multi-class classification in which any instance can be associated with several classes simultaneously, so the classes are no longer mutually exclusive. It has been shown experimentally that the distance-weighted k-nearest neighbour (DWkNN) algorithm is superior to the original kNN rule for multi-class learning. But, it has not been investigated whethe...

Some improvements on NN based classifiers in metric spaces

The nearest neighbour (NN) and k-nearest neighbour (k-NN) classification rules have been widely used in Pattern Recognition due to their simplicity and good behaviour. Exhaustive nearest neighbour search may become impractical when facing large training sets, high-dimensional data or expensive dissimilarity measures (distances). In recent years, many fast NN search algorithms have been d...

Publication date: 2008